Hypercube topology based advanced search algorithm

ABSTRACT

The present invention is a system and method of conducting an adaptive search from a plurality of data sources utilizing a hypercube topology. The system includes a search engine which utilizes a hypercube architecture having a plurality of hypercubes. Each hypercube indexes several data sources in a manner such that similar data sources are located in proximity with other similar data sources. In addition, the search engine utilizes a plurality of message passing ants providing a signal of a path taken for other message passing ants to follow.

RELATED APPLICATIONS

This application is a continuation of a co-pending U.S. patent application Ser. No. 10/899,982 by Srik Soogoor entitled “ADVANCED SEARCH ALGORITHM WITH INTEGRATED BUSINESS INTELLIGENCE,” filed Jul. 27, 2004 which claims the priority of U.S. patent application Ser. No. 10/899,694 by Srik Soogoor entitled “Hypercube Topology Based Advanced Search Algorithm,” filed Jul. 27, 2004 and is hereby incorporated by reference herein.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to searching services. Specifically, the present invention relates to an advanced search algorithm for use in a networked environment.

2. Description of the Related Art

Tremendous advances have been made in providing web services to both consumers and business enterprises. With the increased use of the Internet to transfer information between companies and consumers, the task of organizing and utilizing this information is daunting. Today, business enterprises utilize real-time business intelligence for processing this information. Existing business intelligence may be considered a “data refinery.” In a similar manner as oil refineries are used to convert a raw material (oil) into several products (e.g., gasoline, jet fuel, kerosene, and lubricants), real-time business intelligence systems take another raw material (data) and process it into several products for consumers and enterprises in real time.

Although the existing business intelligence systems manage some forms of data very well, the management of both structured and unstructured data is beyond their capabilities. A business intelligence system, and more specifically an adaptive searching algorithm, is needed which can process both structured and unstructured data in an efficient and meaningful manner.

Thus, it would be a distinct advantage to have a searching algorithm which can efficiently and accurately process both structured and unstructured data. The algorithm should be adaptive and used in conjunction with business intelligences of various business enterprises.

SUMMARY OF THE INVENTION

In one aspect, the present invention is an adaptive searching system. The system includes a search engine for receiving and processing search queries. The search engine utilizes an adaptive search algorithm. The system also includes at least one interface device for communicating with the search engine. The interface device provides a communication link between a user providing a search query and the search engine. In addition, the system includes a plurality of indexed data sources. The search engine utilizes a plurality of message passing ants. Each message passing ant searches the indexed plurality of data sources to answer the search query. The message passing ants also deposit a signal of a path traversed. Other message passing ants may then follow the path by following the signals deposited by other message passing ants.

In another aspect, the present invention is an adaptive searching algorithm responding to a search query from a user through an interface device. The algorithm includes a search engine for receiving and processing search queries. In addition, a plurality of data sources is indexed. In addition, the algorithm uses a plurality of message passing ants. Each message passing ant provides a signal of a path followed in searching the plurality of data sources in response to the search query. Other message passing ants may then follow the signal deposited by a message passing ant while searching the plurality of data sources.

In still another aspect, the present invention is a method of adaptively searching a plurality of data sources within a network. The method begins by indexing the plurality of data sources. Next, a search query is sent by a user to a search engine. Message passing ants are then sent to the data sources to search for an answer to the search query. Each message passing ant deposits a signal to indicate a path traversed by the message passing ant during its search. Other message passing ants may then follow the path taken by previous message passing ants. A response to the search query is sent to the search engine by at least one message passing ant searching the plurality of data sources.

In another aspect, the present invention is a searching algorithm providing an indexed hypercube topology. The searching algorithm includes a plurality of data sources. The algorithm also includes a plurality of cubes. Each cube has a plurality of nodes associated with the data sources. The data sources are indexed and positioned in proximity to another data source based on a similarity of information of the data sources.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified block diagram of a web service system in the preferred embodiment of the present invention;

FIG. 2 illustrates a topology of a hypercube used for indexing data on the various nodes of the system in the preferred embodiment of the present invention;

FIG. 3 depicts a 4-layered 4-cube hypercube topology in the preferred embodiment of the present invention;

FIGS. 4A and 4B are flow charts outlining the steps for conducting a search within the system according to the teachings of the present invention;

FIG. 5 is a flow chart outlining the steps for conducting the adaptive search algorithm according to the teachings of the present invention.

DESCRIPTION OF THE INVENTION

An adaptive search algorithm system and method are disclosed. FIG. 1 is a simplified block diagram of a web service system 10 in the preferred embodiment of the present invention. The system includes a plurality of interface devices 12, 14, and 16. The interface devices may be any computing or communication device communicating in the system 10. The interface devices may be mobile phones, personal data assistants (PDAs), laptops, computers, etc. The interface devices are operated by consumers or users of the system 10. Within the system 10 are a search engine 18 and an indexing server 20. The system 10 incorporates the World Wide Web (Internet) 22 with the other components of the system. In addition, the system includes a data discovery router 24, a business process and rules engine 26, a business intelligence engine 28, a transaction monitor 30 and a meta mapper 32. A corporate database group 34 comprises a plurality of corporate databases 36, 38, 40, and 42. The various components of the system 10 may reside in one or more computing systems, such as servers or other computer workstations. Additionally, some or all of the components may include a computer processor and memory as needed to perform the functions within the system 10. Preferably, the business intelligence engine, the business process and rules engine, the transaction monitor and the meta mapper are all associated with a specific business enterprise running one or more corporate databases. The corporate databases preferably reside at a site separate from the search engine, indexing server and data discovery router. Alternatively, the corporate databases may reside with one or more of the other components of the system 10. The transaction monitor provides a monitoring function for any message sent to or received from the corporate nodes (databases). The meta mapper provides a virtual database of all the corporate databases associated with a specific business enterprise.

The search engine is the gateway for all searching requests from the users of the interface devices 12, 14, and 16 to the system 10. In the preferred embodiment of the present invention, a search engine footprint is embedded within the computing system of each interface device. When a user logs in to the system 10 for the first time, a web service request is activated and ready to make a request. Preferably, the search engine footprint is a program occupying a small amount of memory within each interface device's computing system. The search engine footprint may include memory holding user preferences to assist in the searching requests of the user.

When a search request is made by a user through the interface device, a web service request is sent to the data discovery router 24 via the search engine 18. The data discovery router 24 determines where the web service request needs to be routed, such as the Internet 22, the corporate databases 36, 38, 40, 42, or other sources. Once the data discovery router determines where to send the web service request, a number of background queries are generated and sent. The primary query for the web service request is directed to the source that most closely matches the data discovery router's determination.

In the event that the data discovery router's recommendation is to a corporate database, then the business intelligence engine 28 is activated. The business intelligence engine processes the requests based on the configuration and rules setup of the business process and rules engine 26. For example, the business process and rules engine may provide rules for a plurality of consumers. A consumer may be provided with a special discount if the consumer spent a specified amount of money in the previous year. The business intelligence engine is a platform that takes the output of the business process and rules engine and presents the necessary solution for use in the search engine and in processing the search requests.

The search engine 18 is adaptive and utilizes a novel concept known as an ant colony optimization algorithm in a hypercube topology based environment. The search engine optionally adapts itself to the user's profile. However, a profile setup is not mandatory for a user to use the system. In the preferred embodiment of the present invention, the user's preferences are provided in initial setup through the search engine footprint of the user's interface device.

The search engine is preferably located with a computer server well known in the art. However, the search engine may be located in any computing system allowing communication through the system 10. The search engine includes a capability to perform a generic search, a personal search, a corporate database search and receipt of sponsored advertisements.

In order to facilitate the enhanced searching capabilities of the search engine 18, a novel architecture is utilized. The search engine uses web crawler bots to traverse the web to create an index of all the websites. This indexing is performed prior to any search request. These websites under meta data are grouped in an n-layered hypercube topology with the longest distance between any two points being no more than log(n) base 2 nodes. FIG. 2 illustrates a topology of a cube 50 surrounding a cube 54 used for indexing data on the various nodes of the system (e.g., servers) according to the teachings of the present invention. As the web crawlers traverse the Internet, more daisy-chained hypercube topologies may be built (see FIG. 3). Vertices 52 (“points” or “nodes”) of the hypercube 50 represent indexed search data points. The data points or data sources may be web pages, meta data or a combination of both. Lines depicted between the data points show pathways. One node from one cube is connected by a pathway 57 to a node in an adjacent cube. The indexing server 20 preferably operates using the Linux operating system and uses Intel processors. However, any processor and operating system may be used. The indexing server provides an index of all the data sources found by the web crawler bots.

A hypercube is a cube with more than three dimensions. A single (2^0=1) point (or “node”) may be considered as a zero dimensional cube, two (2^1) nodes joined by a line (or “edge”) form a one-dimensional cube, four (2^2) nodes arranged in a square form a two dimensional cube and eight (2^3) nodes form an ordinary three dimensional cube. Following this geometric progression, the first hypercube has 2^4=16 nodes and is a four dimensional shape (a “four-cube”). An N dimensional cube has 2^N nodes (an “N-cube”). To make an N+1 dimensional cube, two N dimensional cubes are joined at each node on one cube to the corresponding node on the other cube. A four-cube may be visualized as a three-cube with a smaller three-cube centered inside it with edges radiating diagonally out (in the fourth dimension) from each node on the inner cube to the corresponding node on the outer cube.
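The doubling construction described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the disclosed system: the function names `node_count` and `hypercube_edges` and the choice to label nodes with integers are assumptions made for the example.

```python
def node_count(n):
    """An n-dimensional cube has 2**n nodes."""
    return 1 << n


def hypercube_edges(n):
    """Build the edge set of an n-cube by joining two (n-1)-cubes.

    Nodes are labeled 0 .. 2**n - 1 (an assumption for this sketch).
    Each edge is a tuple (a, b) with a < b.
    """
    if n == 0:
        return set()  # a single point has no edges
    lower = hypercube_edges(n - 1)
    offset = 1 << (n - 1)
    # Second copy of the (n-1)-cube, relabeled by setting the new top bit.
    upper = {(a | offset, b | offset) for a, b in lower}
    # Join each node of the first copy to its corresponding node in the second.
    bridges = {(v, v | offset) for v in range(offset)}
    return lower | upper | bridges


if __name__ == "__main__":
    for dim in range(5):
        print(dim, node_count(dim), len(hypercube_edges(dim)))
```

Under this labeling, the four-cube has 16 nodes and 32 edges, and the eight "bridge" edges added in the last step play the role of the edges radiating from the inner three-cube to the outer one.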

Each node in an N dimensional cube is directly connected to N other nodes (e.g., pathway 57). Each node may be identified by a set of N Cartesian coordinates where each coordinate is either zero or one. Two nodes are directly connected if they differ in only one coordinate.
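Because each coordinate is zero or one, a node can be written as an N-bit integer, and the adjacency rule becomes a single bit flip. The following sketch assumes that encoding; the names `neighbors`, `hops`, and `diameter` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def neighbors(node, n):
    """Nodes directly connected to `node` in an n-cube: flip one coordinate."""
    return [node ^ (1 << bit) for bit in range(n)]


def hops(a, b):
    """Shortest path length between two nodes: the number of differing coordinates."""
    return bin(a ^ b).count("1")


def diameter(n):
    """Longest shortest path in an n-cube (n >= 1), found by brute force."""
    return max(hops(a, b) for a, b in combinations(range(1 << n), 2))


if __name__ == "__main__":
    print(neighbors(0b0101, 4))
    print(hops(0b0000, 0b1111))
    print(diameter(4))
```

For a four-cube with 16 nodes the brute-force diameter is 4, which equals log base 2 of the node count and so agrees with the distance bound given for the topology.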

The simple, regular geometrical structure and the close relationship between the coordinate system and binary numbers make the hypercube an appropriate topology for a parallel computer interconnection network. The fact that the number of directly connected, “nearest neighbor”, nodes increases with the total size of the network is also highly desirable for a parallel computation. The proximity of the data points is defined during the mapping process by specifying, through the indexing server 20, indexing definitions. The definitions define the proximity of the information found.

FIG. 3 depicts a 4-layered 4-cube hypercube topology in the preferred embodiment of the present invention. FIG. 3 illustrates a hypercube architecture 70 having a plurality of cubes 50 and 54. The hypercube architecture is fully distributed and utilizes Message Passing Interface (MPI). MPI is implemented by use of “ant colony optimizations.” Ant colony optimization is an evolution-based search technique for the solution of difficult combinatorial problems. The ant colony optimization follows the analogy of ants, which leave a pheromone trail. It should be understood that the number of layers of cubes as well as the number of cubes may vary depending on the search and the amount of data sources available.

These ants, unlike the web crawlers, possess the MPI and are known as Mespas (message passing ants). The Mespas use memory to store partial solutions. The Mespas live in a discrete world, which provides for independent operation of each Mespa with an awareness of other Mespas. The Mespas have heuristic information and may perform a local search. Additionally, the Mespas have a limited intelligence allowing a look-ahead capability. The Mespas follow the trails as depicted on the hypercube topology (lines between vertices 52). The Mespas deposit an analogous pheromone which is problem dependent and a function of the solution quality. The analogous pheromone is a signal deposited by each Mespa providing a trail for other Mespas to follow. As more Mespas traverse the trail, the pheromones (signals) deposited become stronger. Therefore, once a plurality of Mespas traverse a path, other Mespas will follow. This follows the analogy of a colony of ants which at first sends a few ants to scout ahead for food. Once several ants follow a specific path to a food source, other ants follow the pheromones on the trail and are led to the food source.
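The way deposited signals accumulate along a shared trail can be illustrated with a small data structure. This is a minimal sketch under stated assumptions: the class name `TrailMap`, the additive deposit rule, and the use of a single `quality` number per search are all hypothetical, since the text says only that the signal is a function of solution quality.

```python
class TrailMap:
    """Signal strength per pathway (undirected edge) of the topology."""

    def __init__(self):
        self.level = {}

    def deposit(self, path, quality):
        """Add `quality` to every pathway along `path` (a list of node labels)."""
        for a, b in zip(path, path[1:]):
            edge = frozenset((a, b))  # the same pathway in either direction
            self.level[edge] = self.level.get(edge, 0.0) + quality

    def strength(self, a, b):
        """Current signal on the pathway between a and b (0.0 if never traversed)."""
        return self.level.get(frozenset((a, b)), 0.0)


if __name__ == "__main__":
    trails = TrailMap()
    trails.deposit([0, 1, 3], quality=0.5)  # first Mespa
    trails.deposit([0, 1, 5], quality=0.5)  # second Mespa shares the 0-1 pathway
    print(trails.strength(0, 1), trails.strength(1, 3))
```

In this sketch the pathway shared by both Mespas ends up with twice the signal of a pathway used by only one, which is the reinforcement effect the analogy relies on.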

The algorithm for searching within the plurality of hypercubes includes several assumptions. The algorithm assumes that there is a web crawler (Mespa) that is both scalable and incremental. The hypercubes keep a local copy of the web pages with the meta data in a repository which is eventually used for indexing, mining and personalization. Each node of the hypercube topology includes a set of information on a particular web page. These nodes of the web pages have been built using the concept of proximity cluster. The distance from one node to the next node or any other node signifies the “proximity” or “closeness” of those two web pages.
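One way to picture node distance as page closeness is to derive each page's node coordinates from a fixed list of topics, so that pages sharing topics land on nearby nodes. The text does not specify how the indexing definitions assign nodes, so everything here is an assumption for illustration: the `TOPICS` list, the one-bit-per-topic encoding, and the function names.

```python
# Hypothetical indexing definitions: one hypercube coordinate per topic.
TOPICS = ["sports", "finance", "travel", "food"]


def node_address(text):
    """Map page text to a 4-bit node label; bit i is set if TOPICS[i] appears."""
    words = set(text.lower().split())
    return sum(1 << i for i, topic in enumerate(TOPICS) if topic in words)


def proximity(a, b):
    """Node distance: the number of coordinates in which two labels differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    guide = node_address("Travel and food guide")
    blog = node_address("food and travel blog")
    news = node_address("finance news")
    print(proximity(guide, blog), proximity(guide, news))
```

Here the two travel-and-food pages map to the same node (distance 0), while the finance page sits three pathways away. A real index would need a richer similarity measure and a way to handle several pages per node.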

Each hypercube (or plurality of cubes) is assigned at least one web crawler (Mespa). Also, a scoutmaster is utilized to determine which Mespa goes to which hypercube and to start a search. The scoutmaster is ultimately responsible for the search result. A scoutmaster 56 is depicted on FIG. 3. The position and the number of scoutmasters are exemplary only and may be varied. In addition, a plurality of Mespas 58 are also depicted on FIG. 3. The Mespas traverse the paths between each node and search the various data points.

For each Mespa k, the probability p(k, t, w) of moving from node t to node w depends on the combination of two values: the attractiveness n(t, w) of the move on the hypercube, as computed by some heuristic indicating the a priori desirability of the move, and the trail level tl(t, w) of the move on the hypercube, indicating how proficient it has been in the past to make that particular move. The trail level thus represents an a posteriori indication of the desirability of the move.
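The text does not state how the two values are combined. A common choice in the ant colony optimization literature is the Ant System rule, in which the probability of a move is proportional to the trail level raised to a power alpha times the attractiveness raised to a power beta. The sketch below uses that rule as an assumption; the function name and the `alpha` and `beta` parameters are not taken from the text.

```python
def move_probabilities(t, candidates, trail, attractiveness, alpha=1.0, beta=1.0):
    """Return {w: p(t, w)} for each candidate node w reachable from node t.

    trail[(t, w)]          -- trail level tl(t, w), the a posteriori signal
    attractiveness[(t, w)] -- heuristic value n(t, w), the a priori signal
    The weighting rule (assumed here) is tl**alpha * n**beta, normalized.
    """
    weights = {
        w: (trail.get((t, w), 0.0) ** alpha) * (attractiveness.get((t, w), 0.0) ** beta)
        for w in candidates
    }
    total = sum(weights.values())
    if total == 0:
        # No information yet: every candidate move is equally likely.
        return {w: 1.0 / len(candidates) for w in candidates}
    return {w: weight / total for w, weight in weights.items()}


if __name__ == "__main__":
    trail = {(0, 1): 1.0, (0, 2): 3.0}
    attractiveness = {(0, 1): 1.0, (0, 2): 1.0}
    print(move_probabilities(0, [1, 2], trail, attractiveness))
```

With equal attractiveness and trail levels of 1.0 and 3.0, the second move is chosen three times as often as the first. Raising `beta` would shift weight toward the heuristic and away from past experience.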

Trails are preferably updated when the Mespas have completed their search, increasing or decreasing the level of trails corresponding to moves that were part of a “good” or “bad” search, respectively.
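A trail update of this kind might look like the following sketch. The fixed `step` size and the `floor` that keeps trail levels from going negative are assumptions added for the example; the text specifies only that levels rise after a good search and fall after a bad one.

```python
def update_trails(trail, path, good, step=1.0, floor=0.0):
    """Raise or lower the trail level of every move on a completed search path.

    trail -- dict mapping a move (t, w) to its trail level; updated in place
    path  -- list of nodes visited by one Mespa, in order
    good  -- True if the search succeeded, False otherwise
    """
    delta = step if good else -step
    for move in zip(path, path[1:]):
        trail[move] = max(floor, trail.get(move, 0.0) + delta)
    return trail


if __name__ == "__main__":
    trail = {(0, 1): 2.0}
    update_trails(trail, [0, 1, 3], good=True)
    update_trails(trail, [0, 1], good=False)
    print(trail)
```

Updating only after the search completes, rather than at every step, means a move is judged by the outcome of the whole path it belonged to.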

The algorithm includes a tabu list [L] of all the Mespas (inactive list). For the next search request, a Mespa randomly selected from the tabu list is sent to the hypercube 50. Additionally, a scoutmaster is initialized. The scoutmaster selects a hypercube for the search. The scoutmaster initializes p(k, t, w) and n(t, w). Next, the Mespas on a specific hypercube (e.g., hypercube h) perform a parallel operation. Each Mespa is responsible for a cube c. Next, the probability of moving into the cube c is determined. The requested search items are searched amongst the indexed web pages. If any Mespa finds a requested item, the Mespa returns an answer to the scoutmaster. If the requested item is not found, a message is sent to the scoutmaster that the search results were negative. The scoutmaster then terminates the Mespa that failed the search. The scoutmaster is informed of this termination. The search continues within other hypercubes.
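The dispatch, parallel search, and termination steps above can be sketched with a thread pool standing in for the Mespas. This is a simplified illustration, not the disclosed implementation: the function names, the dictionary layout of a hypercube (cube label to indexed pages), the substring match, and the example page names are all assumptions, and trail handling is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def mespa_search(cube_id, pages, query):
    """One Mespa scans the indexed pages of its cube; None means a negative result."""
    hits = [name for name, text in pages.items() if query in text]
    return cube_id, hits or None


def scoutmaster(hypercube, query):
    """Send one Mespa per cube in parallel and collect the results.

    Returns (answers, terminated): answers maps a cube label to matching pages,
    terminated lists the cubes whose Mespa reported a negative result.
    """
    answers, terminated = {}, []
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(mespa_search, cube_id, pages, query)
            for cube_id, pages in hypercube.items()
        ]
        for future in futures:  # read results in dispatch order
            cube_id, hits = future.result()
            if hits is None:
                terminated.append(cube_id)  # the Mespa that failed is retired
            else:
                answers[cube_id] = hits
    return answers, terminated


if __name__ == "__main__":
    hypercube = {
        "cube-50": {"page-a": "red shoes on sale", "page-b": "blue hats"},
        "cube-54": {"page-c": "green socks"},
    }
    print(scoutmaster(hypercube, "shoes"))
```

Collecting all results in the scoutmaster mirrors the statement that the scoutmaster is ultimately responsible for the search result. A fuller version would continue into other hypercubes when every Mespa in the selected one fails.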

FIGS. 4A and 4B are flow charts outlining the steps for conducting a search within the system 10 according to the teachings of the present invention. With reference to FIGS. 1-3, 4A, and 4B, the steps of the method will now be explained. The method begins with step 100 where the user optionally provides preferences through the search engine footprint embedded within the interface device. The preferences may include any information that may be helpful in performing a search, such as a user's home address, interests, buying habits, etc. Next, in step 102, the user requests a search through the interface device. The method then moves to step 104 where a request is generated from the user's interface device to the search engine 18. In step 106, the search engine generates a web service request and sends the request to the data discovery router 24. In step 108, the data discovery router determines where the request is to be routed. The data discovery router then generates and sends a plurality of queries through the system 10 in step 110.

The method then moves to step 112 where it is determined if the data discovery router recommends accessing the corporate database group 34. If it is determined that the corporate database group should be accessed, the method moves to step 114 where the business intelligence engine 28 is activated. Next, in step 116, the business intelligence engine processes the request based on the configuration and rules set of the business process and rules engine 26. The business process and rules engine's configuration is set up as desired to provide specified rules and policies incorporated in the use of the corporate database group 34. The method then moves to step 118 where a search is conducted by the adaptive searching algorithm (explained below in FIG. 5).

However, if it is determined that the data discovery router does not recommend accessing the corporate database group 34, the method moves from step 112 to step 118 where the search is conducted by the adaptive searching algorithm. Next, in step 120, the primary query and results determined by the search engine are sent to the requesting user's interface device.

FIG. 5 is a flow chart outlining the steps for conducting the adaptive algorithm according to the teachings of the present invention. With reference to FIGS. 1-3, and 5, the steps of the method will now be explained. Prior to beginning the search, the various data sources (web pages, meta data, combination of web pages and meta data, etc.) are indexed through the indexing server. The indexing server includes an indexing definitions table which defines information and defines the proximity of data to one another. Therefore, the hypercube topology is in place and fully indexed prior to any search. The method then begins with step 200 where a user generates a search request through the user's interface device. Next, in step 202, the search engine initializes a scoutmaster. During initialization, the scoutmaster selects a hypercube (or plurality of cubes) for conducting the search. In addition, the probability of moving from a node t to a node w [p(k,t,w)] and the attractiveness of the move [n(t,w)] are initialized. Next, in step 204, the search is conducted. Specifically, all Mespas within the hypercube h (selected hypercube or hypercubes) act in parallel. Each Mespa is responsible for a cube c (50 or 54). The probability of the state to move into c is determined. Additionally, each Mespa conducts the search for the requested item.

Next, in step 206, it is determined if the requested item has been found. If it is determined that the requested item has been found, the method moves to step 208 where an answer is returned to the scoutmaster that the requested item has been found. The method then moves to step 210 where the search results are sent to the user through the user's interface device.

However, if it is determined that the item has not been found by the Mespa, the method moves from step 206 to step 212 where the Mespa is terminated. Next, in step 214, the scoutmaster is informed that the Mespa has been terminated. Next, the method moves to step 204 where the search is continued. Initially, Mespas follow a random route in search of answers to the search query. As more Mespas traverse specific trails in the hypercube topology, additional Mespas will follow the trail (attracted to the analogous pheromones). Thus, a trial and error iterative process is conducted whereby as more Mespas travel a specific path, more Mespas follow. The search is then focused to those paths having the most traffic.
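The focusing effect of this iterative process can be shown with a deterministic toy model. The sketch below is an assumption-laden simplification: instead of simulating individual Mespas, each round splits traffic across routes in proportion to their trail level and then deposits signal in proportion to that traffic times a per-route quality. The route labels, the `quality` values, and the function name are hypothetical.

```python
def reinforce(trail, quality, rounds):
    """Return the traffic share of each route at the start of every round.

    trail   -- initial trail level per route (not modified)
    quality -- signal deposited per unit of traffic on each route
    """
    trail = dict(trail)  # work on a copy so the caller's levels are untouched
    history = []
    for _ in range(rounds):
        total = sum(trail.values())
        share = {route: level / total for route, level in trail.items()}
        history.append(share)
        for route in trail:
            trail[route] += share[route] * quality[route]
    return history


if __name__ == "__main__":
    start = {"A": 1.0, "B": 1.0}         # no preference at first
    quality = {"A": 1.0, "B": 0.0}       # only route A leads to the item
    for round_number, share in enumerate(reinforce(start, quality, 5)):
        print(round_number, round(share["A"], 3))
```

Starting from an even split, the share of traffic on the successful route grows every round, which is the sense in which the search is focused on the paths having the most traffic.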

Although the various components of the system 10 are depicted as separate items, such as the search engine 18 and the indexing server 20, the present invention may include components in one or more locations. Additionally, it should be understood that the hypercube architecture is one structure utilized to perform a search using the novel ant colony optimization searching techniques. Any architecture may be implemented to perform the ant colony optimization searching techniques.

The present invention provides many advantages over existing search systems. The present invention enables an adaptive search to be conducted which may process both structured and unstructured data. In addition, the user's preferences may be incorporated into the search request automatically. For example, if a user desires the location of a specific type of restaurant, the search may automatically be limited to restaurants within a certain radius of the user's home address. In addition, the corporate databases may be utilized by providing specific items of interest to the user, such as sales on particular items (e.g., children's clothes). In addition, the searching algorithm enables a search to be conducted which learns from past searches by incorporating the “ant colony optimization” techniques discussed above.

While the present invention is described herein with reference to illustrative embodiments for particular applications, it should be understood that the invention is not limited thereto. Those having ordinary skill in the art and access to the teachings provided herein will recognize additional modifications, applications, and embodiments within the scope thereof and additional fields in which the present invention would be of significant utility.

Thus, the present invention has been described herein with reference to a particular embodiment for a particular application. Those having ordinary skill in the art and access to the present teachings will recognize additional modifications, applications and embodiments within the scope thereof.

It is therefore intended by the appended claims to cover any and all such applications, modifications and embodiments within the scope of the present invention.

1. An adaptive searching system, said system comprising: a search engine for receiving and processing search queries, the search engine utilizing an adaptive search algorithm; an interface device for communicating with the search engine, the interface device providing a communication link between a user providing a search query to the search engine; and a plurality of data sources; the search algorithm having an index of the plurality of data sources; wherein the search algorithm indexes the plurality of data sources by forming the data sources into a hypercube topology, the hypercube topology including a plurality of cubes associated with one or more data source, whereby data sources are arranged in proximity to other data sources based upon a similarity of the information possessed by each data source; whereby the search engine utilizes a plurality of message passing ants, each message passing ant searching the indexed plurality of data sources to answer the search query and depositing a signal of a path traversed, thereby allowing other message passing ants to follow the path taken by a previous message passing ant in response to the signal of the path traversed by a previous message passing ant.
2. The adaptive searching system of claim 1 wherein each message passing ant provides a results message to the search engine.
3. The adaptive searching system of claim 2 wherein a search by a message passing ant of a cube is terminated when a search result is negative.
4. The adaptive searching system of claim 1 further comprising: a plurality of corporate databases, each corporate database storing data related to a specific business enterprise; a business intelligence engine having a process and rules protocol to determine at least one corporate database providing information associated with the search query.
5. The adaptive searching system of claim 1 further comprising a data discovery router for determining the data sources to respond to the search query from the user.
6. An adaptive searching algorithm responding to a search query from a user through an interface device, the algorithm comprising: a search engine for receiving and processing search queries; means for indexing a plurality of data sources; a plurality of message passing ants, each message passing ant providing a signal of a path followed in searching the plurality of data sources in response to the search query; the means for indexing a plurality of data sources includes utilizing a hypercube architecture having a plurality of hypercubes, each hypercube having a plurality of nodes associated with the data sources; and the data sources being indexed in a manner where data sources are positioned in proximity to each other based on similarity of information of the data sources; whereby other message passing ants follow the signal deposited by a previous message passing ant in response to the signal of the path traversed by a previous message passing ant while searching the plurality of data sources.
7. The adaptive searching algorithm of claim 6 wherein: a scoutmaster directs the plurality of message passing ants; whereby the message passing ants follow paths having a deposited signal in response to the search query.
8. A method of adaptively searching a plurality of data sources within a network, the method comprising the steps of: indexing the plurality of data source, wherein the step of indexing the plurality of data sources includes arranging the data sources into a hypercube topology wherein each data source is positioned in proximity to another data source based on the similarity of information possessed by each data source; sending a search query to a search engine by a user; sending a plurality of message passing ants to the data sources searching an answer to the search query; depositing a signal by a first message passing ant to indicate a path traversed by the message passing ant during the search; determining by a second message passing ant the path taken by the first message passing ant in search of an answer to the search query; following in response to the signal of the path traversed by a previous message passing ant, by the second message passing ant, the path of the first message passing ant to answer the search query; and providing a response to the search query by at least one message passing ant searching the plurality of data sources.